This R Notebook analyzes CMS Nursing Home data, focusing on New England states and Massachusetts counties. We will explore the average nursing home ratings and visualize them across counties and states.
Note: The county map is the main reason I chose R over Python for this study, as R provides easier tools for geographic data manipulation and visualization through packages such as tigris and sf.
suppressWarnings({
  library(httr)
  library(jsonlite)
  library(tidyverse)
  library(tigris)
  library(sf)
  library(curl)
  library(plotly)
})
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
## Using libcurl 8.10.1 with Schannel
##
## Attaching package: 'curl'
## The following object is masked from 'package:readr':
##     parse_date
## The following object is masked from 'package:httr':
##     handle_reset
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##     last_plot
## The following object is masked from 'package:httr':
##     config
## The following object is masked from 'package:stats':
##     filter
## The following object is masked from 'package:graphics':
##     layout
dataset_id <- "4pq5-n9py"
url <- paste0("https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items/", dataset_id)
response <- GET(url)
if (status_code(response) == 200) {
  dataset_metadata <- content(response, "parsed")
  # Use the download URL of the first distribution entry
  download_url <- dataset_metadata$distribution[[1]]$downloadURL
} else {
  # Abort with an informative message instead of a bare stop()
  stop("Error fetching dataset metadata: HTTP ", status_code(response))
}
df <- read.csv(download_url)
df_clean <- df %>% filter(!is.na(State) & !is.na(`Overall.Rating`))
## Filtering New England
The ratings below indicate that, on average, Rhode Island has the highest overall nursing home rating among the New England states, followed by Maine and New Hampshire, while Massachusetts has the lowest. The gap is narrow, though: the six state averages span only 2.87 to 2.96.
new_england_states <- c('ME', 'NH', 'VT', 'MA', 'RI', 'CT')
df_new_england <- df_clean %>% filter(State %in% new_england_states)
df_new_england_grouped <- df_new_england %>%
  group_by(State) %>%
  summarise(Average.Rating = mean(`Overall.Rating`, na.rm = TRUE)) %>%
  arrange(desc(Average.Rating))
print(df_new_england_grouped)
## # A tibble: 6 × 2
## State Average.Rating
## <chr> <dbl>
## 1 RI 2.96
## 2 ME 2.91
## 3 NH 2.90
## 4 CT 2.89
## 5 VT 2.88
## 6 MA 2.87
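Because the state averages above sit so close together, it can help to report the number of facilities and a standard error next to each mean. The following is a minimal sketch on invented toy data (the data frame `toy_new_england` and its values are hypothetical, not taken from the CMS file); it only illustrates how the summary could be extended.

```r
library(dplyr)

# Invented toy data standing in for df_new_england (values are NOT real CMS data)
toy_new_england <- data.frame(
  State = c("RI", "RI", "RI", "MA", "MA", "MA", "MA"),
  Overall.Rating = c(3, 4, 5, 1, 3, 5, 3)
)

state_summary <- toy_new_england %>%
  group_by(State) %>%
  summarise(
    Facilities = n(),                                   # rows per state
    Average.Rating = mean(Overall.Rating, na.rm = TRUE),
    # standard error of the mean, using only non-missing ratings
    Std.Error = sd(Overall.Rating, na.rm = TRUE) /
      sqrt(sum(!is.na(Overall.Rating)))
  ) %>%
  arrange(desc(Average.Rating))

print(state_summary)
```

When the standard errors are of the same order as the gaps between the means, the ranking should be read as suggestive rather than conclusive.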
ma_counties <- tigris::counties(state = "MA", cb = TRUE, progress_bar = FALSE)
## Retrieving data for the year 2022
if (exists("ma_counties")) {
  print("ma_counties has been loaded successfully.")
} else {
  stop("ma_counties could not be loaded. Please check the tigris package.")
}
## [1] "ma_counties has been loaded successfully."
df_ma_counties_summary <- df_clean %>%
  dplyr::filter(State == "MA") %>%
  dplyr::group_by(County.Parish) %>%
  dplyr::summarise(Average.Rating = mean(Overall.Rating, na.rm = TRUE))
ma_counties_ratings <- left_join(ma_counties, df_ma_counties_summary, by = c("NAME" = "County.Parish"))
map <- ggplot(ma_counties_ratings) +
  geom_sf(
    aes(
      fill = Average.Rating,
      # `text` is not a ggplot2 aesthetic; ggplotly() reads it for the tooltip
      text = paste("County: ", NAME, "<br>Average Rating: ", round(Average.Rating, 2))
    ),
    # polygon borders are controlled by linewidth (not size) in ggplot2 >= 3.4
    color = "white", linewidth = 0.2
  ) +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(title = "Massachusetts County Ratings", fill = "Average Rating") +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
## Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
## : Ignoring unknown aesthetics: text
interactive_map <- ggplotly(map, tooltip = "text")
interactive_map
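The map depends on the county names in the boundary data (`NAME`) matching the names in the ratings summary (`County.Parish`) exactly; any mismatch silently produces an unshaded county. A quick way to check is an anti-join in both directions. The sketch below uses small invented data frames (`shapes` and `ratings` are hypothetical stand-ins, and the deliberate "SUFFOLK" mismatch is made up to show what a failure looks like).

```r
library(dplyr)

# Hypothetical stand-ins: `shapes` mimics the NAME column of ma_counties,
# `ratings` mimics df_ma_counties_summary. All values are invented.
shapes <- data.frame(NAME = c("Dukes", "Essex", "Suffolk"))
ratings <- data.frame(
  County.Parish = c("Dukes", "Essex", "SUFFOLK"),
  Average.Rating = c(5.0, 3.1, 2.7)
)

# Map polygons with no matching rating row (these would be drawn as NA)
unmatched_shapes <- anti_join(shapes, ratings, by = c("NAME" = "County.Parish"))

# Rating rows whose county name matches no map polygon
unmatched_ratings <- anti_join(ratings, shapes, by = c("County.Parish" = "NAME"))

print(unmatched_shapes)
print(unmatched_ratings)
```

If both results are empty, every county joined cleanly; otherwise the names need normalizing (for example with `stringr::str_to_title()`) before the join.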
Based on the data, Dukes County has the highest average rating, but it has only one facility, so that figure reflects a single nursing home rather than a county-wide pattern. Nantucket is in the same position. For a more reliable comparison of nursing home ratings in Massachusetts, focus on counties with more facilities; the counts per county are listed below.
df_ma_counties <- df_clean %>% filter(State == "MA")
county_facility_count <- df_ma_counties %>%
  group_by(County.Parish) %>%
  summarise(Number_of_Facilities = n()) %>%
  arrange(desc(Number_of_Facilities))
print(county_facility_count)
## # A tibble: 14 × 2
## County.Parish Number_of_Facilities
## <chr> <int>
## 1 Middlesex 72
## 2 Worcester 50
## 3 Essex 47
## 4 Norfolk 34
## 5 Bristol 27
## 6 Plymouth 26
## 7 Hampden 25
## 8 Suffolk 23
## 9 Barnstable 15
## 10 Berkshire 13
## 11 Hampshire 6
## 12 Franklin 3
## 13 Dukes 1
## 14 Nantucket 1
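Given the single-facility counties in the table above, one way to make the county comparison more robust is to require a minimum number of facilities before ranking. The sketch below runs on invented toy data, and the threshold `min_facilities` is a hypothetical choice, not a value used in this notebook.

```r
library(dplyr)

# Invented toy data standing in for the Massachusetts rows of df_clean
toy_ma <- data.frame(
  County.Parish = c("Dukes", "Essex", "Essex", "Essex",
                    "Suffolk", "Suffolk", "Suffolk"),
  Overall.Rating = c(5, 3, 4, 2, 2, 3, 1)
)

min_facilities <- 3  # hypothetical cutoff; pick one that suits the analysis

reliable_counties <- toy_ma %>%
  group_by(County.Parish) %>%
  summarise(
    Number_of_Facilities = n(),
    Average.Rating = mean(Overall.Rating, na.rm = TRUE)
  ) %>%
  # drop counties whose average rests on too few facilities
  dplyr::filter(Number_of_Facilities >= min_facilities) %>%
  arrange(desc(Average.Rating))

print(reliable_counties)
```

In the toy data the single-facility county drops out even though its raw average is the highest, which is the effect described for Dukes County.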
The following plot ranks every state and territory by average overall nursing home rating. Hawaii (HI), Puerto Rico (PR), and Alaska (AK) lead, while Georgia (GA), Louisiana (LA), and Guam (GU) have the lowest averages.
df_state_avg <- df_clean %>%
  group_by(State) %>%
  summarise(Average.Rating = mean(`Overall.Rating`, na.rm = TRUE))
plot <- ggplot(
  df_state_avg,
  aes(
    x = reorder(State, -Average.Rating),
    y = Average.Rating,
    text = paste("State:", State, "<br>Rating:", round(Average.Rating, 2))
  )
) +
  geom_col(fill = "skyblue") +
  labs(
    title = "Top-Ranking Areas for Nursing Home Ratings",
    subtitle = "",
    x = "State/Territory",
    y = "Average Overall Rating",
    caption = "Source: CMS"
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.margin = margin(t = 20, b = 40, r = 30, l = 30),
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, face = "italic", size = 12),
    plot.caption = element_text(hjust = 0.5, size = 8, face = "italic"),
    plot.title.position = "plot"
  )
interactive_plot <- ggplotly(plot, tooltip = "text")
interactive_plot
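The highest and lowest averages can also be pulled out as tables rather than read off the chart. The sketch below uses `dplyr::slice_max()` and `dplyr::slice_min()` on an invented data frame (`toy_state_avg`, with placeholder codes "AA" to "GG" that are not real states or real ratings).

```r
library(dplyr)

# Invented toy averages standing in for df_state_avg
toy_state_avg <- data.frame(
  State = c("AA", "BB", "CC", "DD", "EE", "FF", "GG"),
  Average.Rating = c(3.6, 2.1, 3.4, 2.9, 2.4, 3.8, 2.2)
)

# Three highest and three lowest average ratings
top3 <- toy_state_avg %>% slice_max(Average.Rating, n = 3)
bottom3 <- toy_state_avg %>% slice_min(Average.Rating, n = 3)

print(top3)
print(bottom3)
```

By default both functions keep ties, so more than three rows can come back when averages are equal; pass `with_ties = FALSE` to force exactly three.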
Several areas for future work could deepen our understanding of the factors driving these ratings:
Staffing Analysis 🧑: Investigating the relationship between staffing levels (e.g., Reported.Nurse.Aide.Staffing.Hours.per.Resident.per.Day, Reported.RN.Staffing.Hours.per.Resident.per.Day, Total.number.of.nurse.staff.hours.per.resident.per.day.on.the.weekend) and the overall rating (Overall.Rating) could help identify how staffing impacts care quality.
Health Inspections 🔍: Exploring how health inspection ratings (Health.Inspection.Rating) correlate with nursing home ratings (Overall.Rating). A deeper dive into Most.Recent.Health.Inspection.More.Than.2.Years.Ago might provide insights into the influence of recent inspections on ratings.
Ownership Type: Analyzing how ownership type (Ownership.Type) affects nursing home ratings. A comparison of for-profit and non-profit facilities’ ratings could uncover patterns in care quality based on facility ownership.
Facility Size 🏠: Investigating the relationship between the number of certified beds (Number.of.Certified.Beds) and nursing home ratings (Overall.Rating). Larger facilities may have different challenges compared to smaller ones.
Health Deficiencies: Analyzing the impact of health deficiencies (Rating.Cycle.1.Number.of.Standard.Health.Deficiencies, Rating.Cycle.2.Number.of.Standard.Health.Deficiencies) on overall ratings (Overall.Rating) to understand how deficiencies contribute to lower ratings.
Staffing Turnover 📉: Investigating staffing turnover (Total.nursing.staff.turnover, Registered.Nurse.turnover) and its correlation with the nursing home ratings could offer insights into how retention issues affect care quality.
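As a starting point for the staffing item above, a rank correlation is a reasonable first check, since `Overall.Rating` is an ordinal 1 to 5 score. The sketch below is illustrative only: the data frame `toy_staffing` is invented, and only the column names are taken from the CMS fields listed above.

```r
# Invented toy data; column names follow the CMS fields named above,
# but the values are NOT real.
toy_staffing <- data.frame(
  Overall.Rating = c(1, 2, 3, 4, 5, 3, 2, 4),
  Reported.RN.Staffing.Hours.per.Resident.per.Day =
    c(0.3, 0.4, 0.6, 0.9, 1.1, 0.7, 0.5, 0.8)
)

# Spearman rank correlation, skipping rows with missing values
rho <- cor(
  toy_staffing$Reported.RN.Staffing.Hours.per.Resident.per.Day,
  toy_staffing$Overall.Rating,
  method = "spearman",
  use = "complete.obs"
)

print(rho)
```

The same call could be repeated for the other numeric fields in the list (certified beds, deficiency counts, turnover), keeping in mind that a correlation alone does not show that one factor causes the other.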